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Abstract. Parallel tempering is a generic Markov chain Monte 
Carlo sampling method which allows good mixing with multimodal 
target distributions, where conventional Metropolis-Hastings algo- 
rithms often fail. The mixing properties of the sampler depend 
strongly on the choice of tuning parameters, such as the temper- 
ature schedule and the proposal distribution used for local explo- 
ration. We propose an adaptive algorithm which tunes both the 
temperature schedule and the parameters of the random- walk Me- 
tropolis kernel automatically. We prove the convergence of the 
adaptation and a strong law of large numbers for the algorithm. 
We illustrate the performance of our method with examples. Our 
empirical findings indicate that the algorithm can cope well with 
different kind of scenarios without prior tuning. 

1. Introduction 

Markov chain Monte Carlo (MCMC) is a generic method to approx- 
imate an integral of the form 

I = f / f{x)ir(x)dx, 

JR d 

where ir is a probability density function, which can be evaluated point- 
wise up to a normalising constant. Such an integral occurs frequently 
when computing Bayesian posterior expectations [e.g., 11, 24]. 



The random-walk Metropolis algorithm [21] works often well, pro- 
vided the target density n is, roughly speaking, sufficiently close to 
unimodal. The efficiency of the Metropolis algorithm can be opti- 
mised by a suitable choice of the proposal distribution. The proposal 
distribution can be chosen automatically by several adaptive MCMC 
algorithms; see 0, 0, Qjl, S] and references therein. 

When 7r has multiple well-separated modes, the random-walk based 
methods tend to get stuck to a single mode for long periods of time, 
which can lead to false convergence and severely erroneous results. Us- 
ing a tailored Metropolis-Hastings algorithm can help, but in many 
cases finding a good proposal distribution is difficult. Tempering of 
7r, that is, considering auxiliary distributions with density propor- 
tional to 7r^ with (5 e (0, 1) often provides better mixing within modes 



13l . l2fl l30l . l34j . We focus here particularly on the parallel tempering 
algorithm, which is also known as the replica exchange Monte Carlo 
and the Metropolis coupled Markov chain Monte Carlo. 
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The tempering approach is particularly tempting in such settings 
where 7r admits a physical interpretation, and there is good intuition 
how to choose the temperature schedule for the algorithm. In general, 
choosing the temperature schedule is a non-trivial task, but there are 
generic guidelines for temperature selection, based on both empirical 
findings and theoretical analysis. First rule of thumb suggests that 
the temperature progression should be (approximately) geometric; see, 
Kone and Kofke linked also the mean acceptance rate of the 
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e.g. 

swaps [17[; this has been further analysed by Atchade, Roberts and 
Rosenthal [EJ; see also (27| . 

Our temperature adaptation is based on the latter; we try to opti- 
mise the mean acceptance rate of the swaps between the chains in adja- 
cent temperatures. Our scheme has similarities with that proposed by 
Atchade, Roberts and Rosenthal (5)]. The key difference in our method 
is that we propose to adapt continuously during the simulation. We 
show that the temperature adaptation converges, and that the point of 
convergence is unique with mild and natural conditions on the target 
distribution. 

The local exploration in our approach relies on the random walk Me- 
tropolis algorithm. The proposal distribution, or more precisely, the 
scale/shape parameter of the proposal distribution, can be adapted 
using several existing techniques like the covariance estimator [12] aug- 
mented with an adaptive scaling pursuing a given mean acceptance rate 
[2I, M, 0, 26[ which is motivated by certain asymptotic results 25, 28 . 
It is also possible to use a robust shape estimate which enforces a given 
mean acceptance rate [331 ] . 

We start by describing the proposed algorithm in Section [2J The- 
oretical results on the convergence of the adaptation and the ergodic 
averages are given next in Section [31 In Section [U we illustrate the per- 
formance of the algorithm with examples. The proofs of the theoretical 
results are postponed to Section [5j 



2. Algorithm 

2.1. Parallel tempering algorithm. The parallel tempering algo- 
rithm defines a Markov chain over the product space X L , where Xcl d 

(1) X k = {x£\...,X™) = {X<M). 

(I) 

Each of the chains targets a 'tempered' version of the target dis- 
tribution it. Denote by (3 = (/3^ 1:L ^) the inverse temperature, which 
are such that 1 = /3« > > ■■■ > > 0. and by Z(p) the 
normalising constant 

(2) Z(/3) = J 7r^(x)dx , 
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which is assumed to be finite. The parallel tempering algorithm is 
constructed so that the Markov chain {X fc } fe > is reversible with respect 
to the product density 

pW( ft L )( (L)\ 



over the product space (X L , A' (g)L ). 

Each time-step may be decomposed into two successive moves: the 
swap move and the propagation (or update) move; for the latter, we 
consider only random-walk Metropolis moves. 

We use the following notation to distinguish the state of the algo- 
rithm after the swap step (denoted X n ) and after the random walk 
step, or equivalently after a complete step (denoted X n ). The state is 
then updated according to 

(4) X n _i — ^> X n _i -M 3 ' X n , 

where the two kernels M( S ,/?) and are respectively defined as 

• M(£ denotes the tensor product kernel on the product space 
X L 



(5) M^ t0) (x^ L hA 1 x...A L ) = l[M^ tPW) (x^,A t ) 

e=i 

where each M^i) is a random- walk Metropolis kernel tar- 
geting with increment distribution where is the 
density of a multivariate Gaussian with zero mean and covari- 
ance S, 

(6) M( Sj( g)(x, A) d = / a p (x,y)q i: (y - x)dy 

J A 

+ 5 X (A) J(l-ap(x,y))qx(y-x)dy, 

where 

(7) ap(x,y) = lA^\, for all (x, y) e X x X . 

In practical terms, M(s,/3) means that one applies a random- 
walk Metropolis step separately for each of the chains X^ v 
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denotes the Markov kernel of the swap steps, targeting the 
product distribution -k^ oc tt^ {1> <S) • • ■ <8> ir^ L , 

L-1 



3=1 



where cc7^ is the probability of accepting a swap between levels 
j and j + 1, which is given by 

/ ( U+D\\ /3 (j) -/3 (3+1) 

(9) 4 , (l ,,^, )SlA (^g_)) 

and 

(10) J^(x {j) ,x u+1) ;A) 
J ■■■ J 5 xU) (dy^)8 xU+1) (dj/W) ( d ^ W ) • 



def 



The above defined swap step means choosing a random index 
i G {1, . . . , L — 1} uniformly, and then proposing to swap the 
adjacent states, X^^ = X^_ 1 and X™_ 1 = ■ and ac- 

cepting this swap with probability given in 

2.2. Adaptive parallel tempering algorithm. In the adaptive ver- 
sion of the parallel tempering algorithm, the temperature parameters 
are continuously updated along the run of the algorithm. We denote 
the sequence of inverse temperatures 

(11) {/3nUo = {/3i 1:L) } n > 
which are parameterised by the vector-valued process 

(12) {P»}^o = {^- 1) }^ . 

through (3 { n ] = 1 and p® = /3 w (Pn :t ~ 1) ) for £ G {2, . . . , L} with 

(13) /3^ +1 )(p( 1:£ V= f /3W(P (1; '" 1) )exp(-exp(pW)). 

Because the inverse temperatures are adapted at each iteration, the 
target distribution of the chain changes from step to step as well. Our 
adaptation of the temperatures is performed using the following sto- 
chastic approximation procedure 



(14) p{? = n p (pW 1 + 7n4 ^W( P 23,x r 



1< t < L 
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where (pn' L ) is defined in ffT3l) . H p is the projection onto the con- 
straint set [p, p] , which will be discussed further in Section 12.41 More- 



over, 



a: 



(15) ^(^xj^A^^j 

(16) A/^V 1 ^) = /3 W (P (1 ^ 1} ) " /3 (£+1) (p (1: ' } ) • 

We will show in Section [3] that the algorithm is designed in such 
a way that the inverse temperatures converges to a value for which 
the mean probability of accepting a swap move between any adjacent- 
temperature chains is constant and is equal to a*. 

We will also adapt the random-walk proposal distribution at each 
level. We describe below another possible algorithm for performing 
such a task. In the theoretical part, for simplicity, we will consider only 
on with the seminal adaptive Metropolis algorithm [l2| augmented with 
scaling adaptation [e.g. 0, H, 26[. In this algorithm, we estimate the 
covariance matrix of the target distribution at each temperature and 
rescale it to control the acceptance ratio at each level in stationarity. 

Define by A4 + (d) the set of d x d positive definite matrices. For A G 
Ai + (d), we denote by g m i n (A) and g max (A) the smallest and the largest 
eigenvalues of A, respectively. For e G (0,1], define by Ai + (d, e) C 
Ai + (d) the convex subset 

(17) M + (d,e) = {£ G M + (d) : e < ftnin (S) < £w(£) < e' 1 } . 

The set Ai + (d, e) is a compact subset of the open cone of positive 
definite matrices. 

We denote by T n the current estimate of the covariance at level £, 
which is updated as follows 



r<? = n T 



n—lj 



(19) = (1 - lnM\ + 7n, 2 Xf 



where t(-) is the matrix transpose and Ilr is the projection on to the 
set A4 + (d, e); see Section [23 The scaling parameters is updated so 
that the acceptance rate in stationarity converges to the target a*, 



(20) Tf = n T + 7n , 3 



where ITt is is the projection onto [T, T\ ; see Section I2T41 and Yn is the 
proposal at level £, assumed to be conditionally independent from the 
past draws and distributed according to a Gaussian with mean X^ l _ 1 
and covariance matrix T^^-i which is given by 

(21) EP = exp(7*>)rCf>. 
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In the sequel we denote by Y n the vector of proposed moves at time 
step n, 

(22) {Y„} n > = {ri 1;L )} n > . 

2.3. Alternate random-walk adaptation. In order to reduce the 
number of parameters in the adaptation especially in higher dimen- 
sions, we propose to use a common covariance for all the temperatures, 
but still employ separate scaling. More specifically, 

L 

(23) r B = (1 - Tn^IV! + ?f ~ Vn-l)t{X^ _ ^ j 

1=1 
L 

(24) = (1 - Tn )2 )^n-1 + £ , 

1=1 

and set Y^n = exp(Tn)Y n . 

Another possible implementation of the random-walk adaption, ro- 
bust adaptive Metropolis (RAM) (33[, is defined by a single dynamic 
adjusting the covariance parameter and attaining a given acceptance 
rate. Specifically, one recursively finds a lower-diagonal matrix Yn G 
jgidxcz with positive diagonal satisfying 

(25) rf)t(lf)) = if! , [I + 7n , 2 (a n - a*)n(^)t( M (zW))] f(rg> ,) , 
where = f — -X'n-i an< ^ u ( x ) : = I 7^ 0} and let = 

r«t(r£>). 

The potential benefit of using this estimate instead of fTT8j) — ([20]) is 
that RAM finds, loosely speaking, a 'local' shape of the target distri- 
bution, which is often in practice close to a convex combination of the 
shapes of individual modes. In some situations, this proposal shape 
might allow better local exploration than the global covariance shape. 

2.4. Implementation details. In the experiments, we use the desired 
acceptance rate a* = 0.234 suggested by theoretical results for the 
swap kernel 0, 17] and for the random- walk Metropolis kernel 25, 28| 



We employ the step size sequences •y rii i = Cj(n + with constants 
Ci, c 3 G (0, oo) and c 2 G (0, 1] and £i, £ 2 , £3 £ (1/2, 1). This is a common 
choice in the stochastic approximation literature. 

The projections Il p , Il r and I1 T in ffT41) . (TT5|) and (J2"0]) . respectively, 
are used to enforce the stability of the adaptation process in order to 
simplify theoretical analysis of the algorithm. We have not observed 
instability empirically, and believe that the algorithm would be stable 
without projections; in fact, for the random-walk adaptation, there ex- 



ist some stability results [29|, |3l|, |32| . Therefore, we recommend setting 



the limits in the constraint sets as large as possible, within the limits 
of numerical accuracy. 
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It is possible to employ other strategies for proposing swaps of the 
tempered states. Specifically, it is possible to try more than one swap at 
each iteration, even go through all the temperatures, without changing 
the invariant distribution of the chain. We made some preliminary tests 
with other strategies, but the results were not promising, so we decided 
to keep the common approach of a single randomly chosen swap. 

In the temperature adaptation, it is also possible to enforce the geo- 
metric progression, and only adapt one parameter. More specifically, 
one can use p^h 1 := p n for all i G {1, . . . , L — 1} and perform the adapta- 
tion ( TT3|) to update p n _i — > p n . This strategy might induce more stable 
behaviour of the temperature parameter especially when the number of 
levels is high. On the other hand, it can be dangerous because the as- 
ymptotic acceptance probability across certain temperature levels can 
get low, inducing poor mixing. 

We consider only Gaussian proposal distributions in the random- 
walk Metropolis kernels. It is possible to employ also other proposals; 
in fact our theoretical results extend directly for example to the mul- 
tivariate Student proposal distributions. 

We note that the adaptive parallel tempering algorithm can be used 
also in a block-wise manner, or in Metropolis-within-Gibbs frame- 
work. More precisely, the adaptive random-walk chains can be run 
as Metropolis-within-Gibbs, and the state swapping can be done in the 
global level. This approach scales better with respect to the dimen- 
sion in many situations. Particularly, when the model is hierarchical, 
the structure of the model can allow significant computational savings. 
Finally, it is straightforward to extend the adaptive parallel tempering 
algorithm described above to general measure spaces. For the sake of 
exposition, we present the algorithm only in M. d . 

3. Theoretical results 

3.1. Formal definitions and assumptions. Denote by {Y n } the pro- 
posals of the random-walk Metropolis step. We define the following 
filtration 

(26) T n = a {X , (X fc , X fc _i, Y fc _i), k = 1, . . . , n, } . 

By construction, the covariance matrix S n = f (En ) and the parame- 
ters (3 n = f fin' L ^ are adapted to the filtration T n . With these notations 
and assumptions, for any time step n G N, 

P[X n+1 G .\T n ) = f S^(X n ,dz)M { ^ n) (z, ■) = $ p M iMn) (X n , •) 

Therefore, denoting P(s n)/ 3J = f S /3n M (Snii3n) , we get 

(27) E[/(X ri+1 )iJ n ]=P (Sn , /3ft) /(X n ), 

for all n G N and all bounded measurable functions / : X L — > R. 
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We will consider the following assumption on the target distribution, 
which ensures a geometric ergodicity of a random walk Metropolis chain 
[l|, [THj] . Below, | • | applied to a vector (or a matrix) stands for the 
Euclidean norm. 

(Al) The density n is bounded, bounded away from zero on compact 
sets, differentiable, such that 

x 

(28) lim sup — • Vlog7r(x) = — oo 

r -*°°\x\>r \ x \ 

, , , x Vtt(x) 

29 lim sup — • — , < 0. 

r ->°° \x\>r Fl |vvr(x)| 

In words, (A[T]) only requires that the target distribution is suffi- 
ciently regular, and the tails decay at a rate faster than exponential. 
We remark that the tempering approach is only well-defined when it 13 
are integrable with exponents of interest > — this is the case always 
with (ACE]). 

3.2. Geometric ergodicity and continuity of parallel temper- 
ing kernels. We first state and prove that the parallel tempering al- 
gorithm is geometrically ergodic under (A[T]) . This result might be 
of independent interest, because geometric ergodicity is well known to 
imply central limit theorems. 

We show that, under mild conditions, this kernel is phi-irreducible, 
strongly aperiodic, and ^-uniformly ergodic, where the function V is 
the sum of an appropriately chosen negative power of the target density. 
Specifically, for /3 G K+, consider the following drift function 

(30) V,(*( 1: V=X>(* (£) ), 

i=\ 

where for x G X, 

(31) V,(x) = (7r(x)/\\n\\J-^. 
For (3 > 0, define the set 

(32) K Po G (0,l] L ,/3 < /3 (L) < ■■■ < . 



We denote the V- variation of a signed measure /x as \\fi\\v '■= su P/ : |/|<y /•*(/); 
where the supremum is taken over all measurable functions /. The V- 

norm of a function is defined as = f sup^. \ f(x)\/V(x). 

Theorem 1. Assume ■ Let e > and /3q > 0. Then there exists 

C e ^ < oo and Q e> p < 1 such that, for all x G X L , S G A4 + (d, e) and 
(3 G JC/3 , 

(33) ||P( S ,/3)(x, •) - ir^l < C^qI^V^x). 
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Geometric ergodicity in turn implies the existence of a solution of 
the Poisson equation, and also provides bounds on the growth of this 
solution [22I, Chapter 17] 

Corollary 2. Assume (Alty ■ Let e > and /3q > 0. For any mea- 
surable function f with ||/|| V a < 00 for some a G (0, 1] there exists a 

unique (up to an additive constant) solution of the Poisson equation 
(34) 9-^,0)9 = f-Mf)- 

This solution is denoted f(s,/3)- I n addition, there exists a constant 
D e> p Q < 00 such that 



(35) ||/(s,/3)||v^ o < A,A)II.Mlv» j 

We will next establish that the parallel tempering kernel is locally 
Lipshitz continuous. For any > 0, denote by D V/3 [(12,(3), (12' , (3')} 
the V^-variation of the kernels P(s,/3) and Prs'^'), 

, r ||P(S,/3)(X, ■) - P (53 / j g')(x, -)|| 

(36) D V , [(12,(3), (12', (3')} d = f sup \ ^ . 

For (3q G (0, 1) and r\ > 0, define the set 

(37) ^ {/3 < /? (L) < • • < < 1 , /3 (fc) - P (k+1) > v} ■ 

Theorem 3. Assume (A~I\). Let e > 0, (3 > and rj > 0. For any a G 

(0,1], taere exists K e<a ^ 0)n < 00 suc/i £/iai, for any 12,12' G .M+(d, e) 
and any (3,(3' G JCp 0;V! it holds that 

D v?q [(12, (3), (12', (3')] < K e , aAhV {\(3-(3'\ + |S - . 

3.3. Strong law of large numbers. We can state an ergodicity re- 
sult for the adaptive parallel tempering algorithm, given the step size 
sequences satisfy the following natural condition. 

(A2) Assume that the step sizes {7 ni ,n G N}, % = 1,2,3 defined in 
(THlhflTgl) , and (|2"01) are non-negative and satisfy following condi- 
tions 

(i) For i = 1,2,3, Y, n >iln,i = 00 and J2 n >i n_1 7«.,i < 00 
00 sup neN 7„ i2 < 1 

Remark 4. It is easy to see that 7^ = Ci(n + l) - ^ with some c\, C3 G 
(0, 00) and c 2 G (0, 1] and 6,6,6 € (0, 1] satisfy (A[2j). 

Theorem 5. ^ssnme (^4QP - (AW and E[Vp (K )] < 00. Taen, for 
any function f : X L — > K swc/j £/mt ||/||v a < 00 / or some a £ (0, 1) 
and jiw en lim^oo 71-/3 (f) exists, we have 



1 n 

-y]/( x i) — ► lim K(3 n (f) a - s - 



n . 
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Remark 6. In practice, one is usually only interested in integrating 
with respect to it, which means functions / depending only on the first 
coordinate, that is, f(x^' L ^) = f(x^). In this case, the limit condition 
is trivial, because Ttp n (f) = 7r(/) for all n G N. 

3.4. Convergence of temperature adaptation. The strong law 
of large numbers (ITheorem 51) does not require the convergence of 
the inverse temperatures, if only the coolest chain is involved 
fIRemark 61) . It is, however, important to work out the convergence 
of the adaptation, because then we know what to expect on the as- 
ymptotic behaviour of the algorithm. Having the convergence, it is 
also possible to establish central limit theorems [l| ; however, we do not 
pursue it here. 

We denote the associated mean field of the stochastic approximation 
procedure (fill) by 

h( P )^ [^(pW),...,^)^ m > 

where 

h® (pW , . . . , p W ) J H «) {p W , . . . , p W , x)7r/3 (dx) . 
We may write 

'■'"// 

where Z(j3) is the normalising constant defined in 

The following result establishes the existence and uniqueness of the 
stable point of the adaptation. In words, the following result implies 
that there exist unique temperatures so that the mean rate of accepting 
proposed swaps is a*. 

Proposition 7. Assume (-$1$ ■ Then, there exists a unique p = f 
[pW, . . . , p( L_1 )] solution of the system of equations h^\p^\ . . . , p^) = 
0,£e{l,...,L-l}. 



Remark 8. In Proposition Proposition 7 it is sufficient to assume that 



the support of 7r has infinite Lebesgue measure and that J 7r K (x)dx < 
oo for all < k < 1; see ILemma 26l 

Remark 9. In case the support of 7r has a finite Lebesgue measure, it is 
not difficult to show that for a sufficiently large number of levels L > Lq 
there is no solution p. On the contrary, in formal terms, p^ = oo for 
£ > Lq, so that the corresponding inverse temperatures (3^ = for 
C- > Lq + 1. For our algorithm, this would imply that it simulates 
asymptotically ir°/Z(0), the uniform distribution on the support of n, 
with the levels £ > Lq + 1. 

For the convergence of the temperature adaptation, we require more 
stringent conditions on the step size sequence. 
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(A3) Assume that step sizes {7^,71 e N} defined in and 
( EU|) are non-negative and satisfy following conditions 

(0 En>l7n, t = 00, En>l7n,l < °°> 3nd E„>1 7n,l7nj < 00, 

J ="2,3. 
(ii) sup neN 7 n , 2 < 1 

(i'O En>l l7n+l,l -7n,l| < OO 

It is easy to check that the sequences introduced in IRemark 4l satisfy 
(AS]) if we assume in addition that £1,^2, £3 £ (1/2, 1]. 

Theorem 10. Assume (AM) > EV ( g (X ) < 00. In addition for 

all i = 1, . . . , L — 1 we assume that p < p™' < ~p, where p is given by 



Proposition 7 Then 



lim p n = p a.s. 



4. Experiments 



We consider two different type of examples: mixture of Gaussians 
in Section 14.11 and a challenging spatial imaging example in Section 
14.21 In all the experiments, we use the step size sequences 7 H) . = 
(n + l) -0 ' 6 , except for RAM adaptation, where 7„ i2 = min{0.9,(i • 
(n + 1)~ 06 } (see 33| for a discussion). We did not observe numerical 



instability issues, so the adaptations were not enforced to be stable 
by projections. We used the following initial values for the adapted 
parameters: temperature difference p^ = 1, covariances Eg = / and 
scalings 9q = 1. 

4.1. Mixture of Gaussians. We consider first a well-known two- 
dimensional mixture of Gaussians example [e.g. 0, [l9|. The example 
consists of 20 mixture components with means in [0, 10] 2 and each 
component has a diagonal covariance a 2 I, with a 2 = 0.01. Figure 1 
shows an example of the points simulated by our parallel tempering 
algorithm in this example, when we use L = 5 energy levels and the 
default (covariance) adaptation to adjust the random walk proposals. 



Figure 2] shows the convergence of the temperature parameters in the 
same example. 

We computed estimates of the means and the squares of the coor- 
dinates with iV = 5000 iterations of which 2500 burn-in, and show 
the mean and standard deviation (in parenthesis) over 100 runs of our 
parallel tempering algorithm in lTableTl We considered three different 
random- walk adaptations: the default adaptation in (|18p - (|20p (Cov), 
with common mean and covariance estimators as defined in (123]) - ([24")) 
(Cov(g)) and the RAM update defined in ([275]) . ITable II shows the re- 
sults in the same form as [7|, Table 3] to allow easy comparison. When 
comparing with Q, our results show smaller deviation than the un- 
adapted parallel tempering, but bigger deviation than their samplers 
including also equi-energy moves. We remark that we did not adjust 
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level 1 



level 2 



level 3 



level 4 



level 5 




-5 5 1015 -5 5 1015 -5 5 1015 -5 5 1015 -5 5 1015 



Figure 1. Simulated points of the tempered distribu- 
tions over 5000 iterations. The random-walk proposal is 
illustrated as an ellipsoid. 
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500 1000 1500 2000 2500 3000 3500 4000 4500 5000 

Figure 2. Convergence of the log-temperatures in the 
mixture example. The 10%-90% quantiles over 100 runs 
of the algorithm are drawn in grey and the black line 
shows the median. 



Table 1. The test of Q, Table 3 case (A) with L = 5 
temperature levels, 5000 iterations and 2500 burn-in. 





E[Xi] 


E[X 2 ] 


E[Xf] 


E[Xi) 


True value 


4.478 


4.905 


25.605 


33.920 


Cov 


4.469 (0.588) 


4.950 (0.813) 


25.329 (5.639) 


34.209 (8.106) 


Cov(g) 


4.389 (0.537) 


4.803 (0.692) 


24.677 (5.411) 


32.865 (6.660) 


RAM 


4.530 (0.524) 


4.946 (0.811) 


26.111 (5.308) 


34.331 (8.292) 



our algorithm at all for the example, but let the adaptation take care 
of that. There are no significant differences between the random- walk 
adaptation algorithms. 



When looking the simulated points in Figure 1 it is clear that three 



temperature levels is enough to allow good mixing in the above exam- 
ple. We repeated the example with L = 3 energy levels, and increased 
the number of iterations to N = 8333 in order to have a comparable 
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Table 2. The test of p), Table 3 case (A) with L = 3 
temperature levels, 8333 iterations and 4167 burn-in. 





E[Xi] 


E[X 2 ] 


E[Xf] 


E[X*\ 


True value 


4.478 


4.905 


25.605 


33.920 


Cov 


4.480 (0.416) 


4.957 (0.571) 


25.542 (4.164) 


34.420 (5.669) 


Cov(g) 


4.488 (0.422) 


4.884 (0.551) 


25.719 (4.190) 


33.520 (5.476) 


RAM 


4.490 (0.407) 


4.881 (0.541) 


25.667 (4.281) 


33.622 (5.631) 



Table 3. The root mean square errors in the modified 
mixture example with a 2 = 0.001, d — 8 and L = 8. 



N 


Est. 




Cov 


Cov(g) 


RAM 


10k 


E[X] 




3.080 


2.245 


1.660 


E[\X 


2 ] 


27.577 


20.426 


16.428 


20k 


E[X] 




1.788 


1.580 


1.429 


E[\X 


2 ] 


18.577 


15.712 


14.475 


40k 


E[X] 




1.439 


1.267 


0.952 


E[\X 


2 ] 


15.471 


12.769 


9.364 


80k 


E[X] 




1.257 


0.975 


0.698 


E[\X 


2 ] 


13.017 


9.414 


6.981 


160k 


E[X] 




1.096 


0.680 


0.508 


E[\X 


2 ] 


11.093 


7.038 


5.122 



computational cost. The summary of the results in lTable "V indicates in- 
creased accuracy than with L = 5 levels, and the accuracy comes close 
to the results reported in 0] for samplers with equi-energy moves. 

We considered also a more difficult modification of the mixture ex- 
ample above. We decreased the variance of the mixture components 
to a 2 = 0.001 and increased the dimension to d = 8. The mixture 
means of the added coordinates were all zero. We ran our adaptive 
parallel tempering algorithm in this case with L = 8 temperature lev- 
els; ITable 31 summarises the results with different number of iterations. 
In all the cases, the first half of the iterations were burn-in. In this sce- 
nario, the different random-walk adaptation algorithms have slightly 
different behaviour. Particularly, the common mean and covariance es- 
timates (Cov(g)) seem to improve over the separate covariances (Cov). 
The RAM approach seems to provide the most accurate results. How- 
ever, we believe that this is probably due to the special properties of 
the example, namely the fact that all the mixture components have a 
common covariance, and the RAM converges close to this in the first 
level; see the discussion in (33^ . 
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4.2. Spatial imaging. As another example, we consider identifying 
ice floes from polar satellite images as described by Banfield and Raftery 
The image under consideration is a 200 by 200 gray-scale satellite im- 
age, and we focus on the same 40 by 40 subregion as in The goal 
is to identify the presence and position of polar ice floes. Towards this 
goal, Higdon [lij employs a Bayesian model with an Ising model prior 
and following posterior distribution on {0, l} 40x40 ; 

log(vr(x|?/)) OC ^ a H X i,j = Vi,j} + = X i'j'} ' 

1<U,<40 

where neighbourhood relation (~) is defined as vertical, horizontal and 
diagonal adjacencies of each pixel. Posterior distribution favours x 
which are similar to original image y (first term) and for which the 
neighbouring pixels are equal (second term). 



In [14| and [8|, the authors observed that standard MCMC algo- 
rithms which propose to flip one pixel at a time fail to explore the 
modes of the posterior. There are, however, some advantages of using 
such an algorithm, given we can overcome the difficulty in mixing be- 
tween the modes. Specifically, in order to compute (the log-difference 
of) the unnormalised density values, we need only to explore the neigh- 
bourhoods of the pixels that have changed. Therefore, the proposal 
with one pixel flip at a time has a low computational cost. Moreover, 
such an algorithm is easy to implement. 

We used our parallel tempering algorithm with the above mentioned 
proposal with L = 10 temperature levels to simulate the posterior of 
this model with parameters a = 1 and — 0.7. We ran 100 replications 
of iV = 100000 iterations of the algorithm. Obtained result are shown 
in Figure 3 is similar to [14J and [8(. We emphasize again that our 



algorithm provided good results without any prior tuning. 



5. Proofs 

5.1. Proof of ITheorem 11 The proof follows by arguments in the lit- 
erature that guarantee a geometric ergodicity for the individual random- 
walk Metropolis kernels, and by observing that the swap kernel is in- 
variant under permutation-invariant functions. 

We start with an easy lemma showing that a drift in cooler chain 
implies a drift in the higher-temperature chain. 

Lemma 11. Consider the drift function W == C7r _K for some positive 
constants k and c. Then, for any X e M.Ad), 

< p> M { ^p>)W{x) < Mp,p)W(x) , for allxeX . 
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FIGURE 3. Spatial model example: original image (left), 
posterior estimate based on 100 replications of adaptive 
parallel tempering (right) 



Proof. We write 



W(x) 



+ 



{y ■ n(y)>n(x)} 

1 - 



{y ■ ir(y)<n(x)} 



7T{X) 
7t(x)J 



+ 



n{v) \ 

7T(x)J 



B-K 



qx(x-y)dy 



First term is independent on (3, since (3 i— > 1 — a 13 + a 13 K for a e [0, 1] is 
non-increasing the second term is also non-increasing with respect to 
(5. □ 

To control the ergodicity of each individual random-walk Metropolis 
sampler, it is required to have a control on the minorisation and drift 
constants for the kernels Mrzp). The following proposition provides 
such explicit control. 

Lemma 12. Assume (-$J\) ■ Let e > and > 0. There exist 
X £)/ 3 € [0, 1) and b £ ^ < oo such that for any £ G Ai + (d, e), we get 



(38) 



Mi 



V B < K,bV b + b 



where Vp == (ir/ 11^(1^) ^ 2 , where 



71 



Proof. It is easily seen that if the target distribution is super-exponential 
in the tails (A[JJ) , then all the tempered versions tt 13 /Z((3), where the 
normalising constant Z{J$) is defined in (|2J), satisfy (A[JJ as well. 

The result then follows from Andrieu and Moulines [l|, Proposi- 
tion 12]. □ 
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Proposition 13. Assume f^CZP and let e > and j3 > 0. Then, there 
exists X e ,p a < 1; an d b e /3o < oo such that, for all £ G .M + (d, e) and 
/3 G K.p a , 

(39) M (Sj/3) V A < A eA V A + 6 e ,/3 , 

(40) S^V A) = V A , 

(41) P(S,/3)V ft) < A^Vft + 6 e ,/3 , 

Proof. Bv lLemma 11[ since f3 G /C^, we get 

L 

(42) Mpj^Vft,^) = E M (sW ^)%(*W) 

£=1 
L 

<E m (^'a)^o(^)- 

Then, by lLemma 12[ since S G A^ + (d, e), it holds 

L L 

E v^a( sW ) ^ A ^o E + • 

£=1 £=1 

Thanks to the definition of the swapping kernel f[8l)- ffT0]) . for any pos- 
itive measurable function F : X L — > R + which is invariant by permu- 
tatiorQ, we get SpF(x( 1:L ^) = F(x^ 1:L ^). Since the drift function 
defined in fl3"0"|) is invariant by permutation we obtain 

Sp [K^V^x^) + Lb e ,p ] = X eA S V Po (x^) + Lb e , Po 

= X etPo Vp (x^) + Lb eJh . □ 

Proposition 14. Assume (J^) . Let e > 0, (5q > and r > 1 ; and 

consider the level set C r = f {x G X L : V j g (x) < r}. Tnere exisis a 
constant o~ r ,e,p > swc ^ / or a ^ ^ e -M+(^> e ) an( ^ /3 G /C^ , ine 
set C r is a (l,S rie ^ ,u r )-small set for P(s,/3), is> 

(43) P( S)/3 )(x, •) > fip, e ,^,i/ r , e ,^(-) x G C r , 

where v T {-) = A Lcb (- fl C r )/A Leb (C r ) is a probability measure on C r and 
A Leb stands for the Lebesgue measure. 

Proof. It is easy to see that the set C r is absorbing for because 
SpVp(x) = Vp{x) as observed in the proof of proposition 14[ implying 
Sp(x, C r ) = 1 for all x G C r . Hence for x G C r 

(44) P( Sj/3 )(x, A) > / f[ fl A feW (yW-^)dy^ , 

Jc r nA e=1 V 7r l a ' v J/ 

x F(x (1:L) ) = F(x (o " (1):<T(i)) ) for any (x ( - 1:L ' ) ) 6 X L and any permutation cr over 
the set {!,...,£} 
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where gs is the multivariate Gaussian density with zero mean and 
covariance E. Since the set C r is compact and £ G Ai^(f3 ,e), there 
exists a constant C r>e ^ > such that for any £ = 1, . . . , L 

t inf q^(y^-x^)>C r ^ . 
Therefore, by (|44|) and since (3^ G (0, 1], we obtain for x G C r 

If y = g C r , we get (vr(?/ w )/ lklloo) _A ' < r / L for a11 £ e 

{1,...,L}, which implies that (L/r) 2 ^ ||7r|| < 7r(y) . Hence, for all 
G C r x C r , ix{y)/-n(x) < (L/r) 2 ^ , showing that 

P ( ^)(x,A) >C^ [(lA(L/r))^fA Lcb (a)A^ b (C r nA), 

where A^ b (-) = A Lcb (C r n -)/A Lcb (C r ). □ 

Proof of \Theorem 1\ Choose a sufficiently large r > 1 so that there 
exists a \ t ^ < 1 such that 



Pp,fDVpa(x) < K,fhV Po (x) + 1 {x i C} b, 



'e,/3o 



by Proposition 13 


where C r is defined in Proposition 14 


This drift 


inequality, with the minorisation inequality in 


Proposition 14 imply 



^-uniform ergodicity ( 133]) with constants depending only on X e ^ < 1, 
b e ^ and 5 r , e)/3o [e.g. |23j. □ 

5.2. Proof of ITheorem 31 We preface the proof of this Theorem by 
several technical lemmas. 

Lemma 15. For all (/3,/3') G (0, l) 2 we have 

SUp \zP-Z?\< ^nW'-Pl- 

ze[o,i] m&x{(3, fj'\ 

Proof. Without loss of generality, assume that < (3 < f3' < 1. The 
function w : [0, 1] — > K defined as w(z) = z 13 — z 13 is continuous, 
non- negative and w(0) = w(l) = 0. Therefore, the maximum of this 
function is obtained inside the interval (0,1). By computing the de- 
rivative w'{z) = (3z l3 ~ 1 — (3'z 13 _1 and setting w'{z) = 0, we find the 

maximum at z = {-p ) , so 



ze[o,i] 



P'J V P'J ~ V P'J P 
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Lemma 16. Set j3 G (0,1]. There exists a constant Kp < oo such 
that for any (8,8') G [/?o, 1] 2 ; any covariance matrix E G M. + (d), 



(45) / {Vfriy) + V /3o {x))\ap{x,y) - a^{x,y)\q^{y - x)X Lch {dy) 
Jx 

< K^P - PlVkix) . 
In addition, for any measurable function g with \\g\\ v < 1, 

(46) \M {StP) g(x) - Mp,fy )g {x)\ < K Po \8 - 8'\V Po {x) . 

Proof. Without loss of generality, we assume that 8 < 8' . Recall that 
V Po (y) oc Tr~M 2 (y). Note that 



v Po(y) \ a fs{ x iV) - <xp'{x,y)\qv{y - x)dy 



V/3 (x) 



ir(x) 



P-Po/2 



7T(x)J 



P'-Po/2 



q E (y-x)dy. 



where R x := {y G X : Tt(y) < tt(x)}, and 



Vh(x) / \ap(x,y) -ap(x,y)\qz(y-x)dy 

'X 



7r(:c) 



7r(x) 



0' 



Using [Lemma 151 we get 



itlx) 



It 



(2/) 



7T (X, 



QE(y-x)dy < 



qx(y-x)dy. 



8 - 



and similarly 



Ay) 

ir(x) 
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(y) 



TtlX, 



which concludes the proof of (j45J). 
We consider now fl46l). Note that 



qvty-x)dy<-\8'-P\ 



(47) \Mp,fi)g{x) - Mp,p, )g {x)\ 

= I (9(y)~ 9{x)){a p (x,y)-a^(x,y))q J] (y-x)dy 
Jx 

< / \g{y) ~ 9{x)\\aip{x,y) - ap(x,y)\q-z{y - x)dy 
Jx 

< / (Vp (y) + Vp (x)) \ap(x,y) - a p >{x,y)\q^{y - x)dy 



and we conclude using (1451) . 



□ 
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The following result is a restatement of |l|, Proposition 12 and (the 
proof of) Lemma 13]. 

Lemma 17. For any e > there exists K e < oo such that for any 
(£, £') G Ai + (d, e), (3 G [0, 1], and function g with WgWv^ < 1? we have 

\Mptfg{x) - Mpr tfi) g{x)\ < K e \E - S'|^ (x) . 
In addition there exists K q < oo such that 

(48) / \qx(z) - gjy(z)|dz < K q \E - E'|. 

ILemma Tfl can be generalised also for non-Gaussian proposal distri- 
butions, including the multivariate Student [3l|, Appendix B]. 

Now we show the local Lipschitz-continuity of the mapping (£, (3) y 

M (S,/3). 

Proposition 18. Assume ($3$ ■ Let e > and (3q > 0, and r] > 0. 

There exists a constant K t ^ 0tV < oo such that such that for all E, £' G 
M.+(d,e), (3,(3' G K-fatf, and functions g with \\g\\ Wfj < 1, we have 

|M (S)/3) ^(x) - M (s - i/3 ^(x)| < K eA ,,{|/3 - + |S - V A (x) . 

Remark 19. In this proof, it is possible to set rj = 0. The use of r] > 
is required in the proof of continuity. 

Proof. We may write 

(49) |M (Si/3) £(x) - M (5y>/30 2(x)| < |M (S)/3) #(x) - M (S)/30 ^(x)| 

+ |M (S)/305 (x) - M (s , i/30 s(x)| = i?i(x) + R 2 (x) . 

First, we consider Ri. We prove by induction that there exists a con- 
stant Kk e Bn n < oo such that, for all measurable g such that ||o|| v < 

' ' II II V ^ 

< ifjfc je ,^|^-^'|V / j (x), 
where = Ui<e<k ^(53(0,^(0), and 

M(EW,^))(^ (1:i) ,^i x-x4) = M (EWi/3W) (x w ,^) 



(50) 



M 



(fc) 

(S,/3) 



S(x) - M|^^(x 
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We first establish the result for k = 1. For any £ G {1, . . . , L}, we get 

|^(£W,/3W)#( X ) ~ ^(E(Oj9'«))fl r (x)| 

\ap(t)(x (e) ,y) - ap,( e )(x {e \y)\qx( e )(y - x (£) )d?/ 



< 



x 



\ap W (x®,y) - a^e)(x^\y)\q E (e)(y - x {e) )dy 

Applying ( T45|) . there exists K < oo such that for any t G {1, . . . , L}, 

we get 



(51) |M (sW)/3W) s(x) -M (sW) ^)^(x)| < -/3'W|^^ o (^)) 

it=i 

Taking £ = 1 establishes f )50p with ft = 1. Assume now that (I50p is 
satisfied for some ft G {2, . . . , L — 1}. We have 

|M^(x)-M[^(x)| 

< I (M^tfc+i^fc+ij) - M^^+ij^tfc+i))) M^^(x)| 
+ |M (E ( fc+1 ) i/3 „ fc+1)) (M[g i/3) - Mg^^Cx)! 

For any ||<7|| v ^ < 1, we have 

k L 

|Mg >/3)ff (x)| < Mg^V^x) = ^M (sW ^ W) y ft (x»)+ £ 

£=1 £=fc+l 

ILemma 121 implies that, for any 1 1 (7 1 1 v < 1, 



M 



s,/3)^( x )| < V ft (x) + /c6 eA 



showing that ||M^ ^gUv^ < 00. Hence by fl5Tj) there exists a constant 
(i) 



^+1,6^0^ < 00 such that 



(M( E (fc+i) j/3 (fc+i)) — M ( - S (fc+i) )/3 /(fc+i)))M|^^5f(x 



< 



<i^o,J/3 (/£+1) -/3 /(fc+1) |V ft) (x). 
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By the induction assumption fl50|) and lLemma 12l there exists K^_ 1 e g Q < 
oo such that second term is bounded by 

|M (E ( fc+ i)^( fc+ i )) (Mg )/3) - m[^ ;) )^(x)| 

This shows that (l50i) is satisfied for k + 1. Carrying out the induction 
until k = L, there exists a constant ifi, eil g 0)J7 < oo such that, for all 

Nlv, <i, 

(52) = |M (S)/3)5 (x) - M (S)/3 ^(x)| < - /3'|V A) (x) , 



where i?i is defined in fj49|) . 

Consider now _R 2 - It is easy to see that by (|48j) we obtain analogous 
formula to (!5T|) ; that is, with the same temperatures f3 but different 
covariance matrices XI and XI'. The proof is concluded by using the 
same induction proof as for the term R x . □ 

Lemma 20. Let /3 > and r) > 0. Then, there exists a constant Kp 0)V 
such that, for any (3, (3' G /C/3 0jJ? and for any measurable function g with 
IMIv^ — 1> ^ holds that 

|S^(x) - S^(x)| < K^IP - /3'|V A) (x) . 
Proof. Using the definition ([8]) of Sp, we get 

(53) 

|S^(x)-S^(x)| = T^ll E (t*?(* W ,* ( < +1) ) - ^(i®^) 

£=1 

X C„C T (1:<-1) T (M-1) T (<) T (L)\ _ / (1:€-1) (€) (L)^ 

By it holds that ^(iW,!^ 1 )) = = 1 whenever 

7r(x^ +1 - ) ) > 7r(x^). Therefore, using [Lemma 15[ we get that 

(54) 

l4VV (m) )-4'V^ (m) 



i i — i i \ / / i (■ -i- i i \ \ 

7T 



^-{T7(x( e + 1 ))<TT(xW)} 



< 



7t(xW) J V 

Ifld+i) -^+1)1 + I^W-^ffll 



Since /3,/3' G max{(/3W - , - > 77 for all 

£ G {!,..., L}. Because ||p|| Vj8 — 1 an d V^ are invariant under 
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permutations, we have by (155|) and (I51j) 

4 



|S^(x) - S^(x)| < ^3Tj^ V *( x ) E 1^ - ^1 ' D 

Now we are ready to conclude with the continuity of the parallel 
tempering kernels. 

Proof of \Theorem 31 The definition fl30l) of implies that, for any 
ae (0,1], 



L a ~ l E *w* w ) < E v *>(?®) ^ E ^ 



i=l \ i=l / i=l 

A) 

•|| v p are equivak 

a = 1. Write 

|P(E,/3)0(x) - P (S ' )/30 ^(x)| < Ti(x) + T 2 (x 



showing that L a 1 V Q , /3o (x) < V2 (x) < V a/ g (x). Therefore, the norms 
||-|| V a and ||-|| v ^ are equivalent. It suffices to prove the results with 



where 



7i(x) A ^ |S^(M (Si/3) - Mpw)g{x) 

T 2 (x) = f \(Sp - S /3 /)M( S / >i g/)(/(x)| . 



By |Proposition 181 we obtain 

Tx(x) < ^ 0)J7 {|^-/3'| + |S - E / |}S jB V /!b (x) 

<lf eA , )? {|i9-/3 , | + |S-S , |}V /Jb (x). 

By fl39|) of lLemma 121 we obtain that ||M(53 / , j 9')fl , || Vj8 < C- Hence, by 
ILemma 20l we get that 

T 2 (x)<CKp 0)V \(3-l3'\Vp (x), 

which concludes the proof. □ 

5.3. Proof of ITheorem 51 We now turn into the proof of the strong 
law of large numbers. We start by gathering some known results and 
by technical lemmas. 

Lemma 21. Assume (-A[J\) and that, in addition, E[V / g (Xo)] < oo. 
Then, 

supE[V^ ) (X n )] < oo, 

n>l 

where X n is the state of the adaptive parallel tempering algorithm de- 
fined in (jll) . 
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Proof. Under (AdJ , by Proposition 13 for all n G N we have that 



(55) P^fiJVpo^X^Vpo+b. 
Iterating fl55|) . by ff27|) we obtain 

E[V ft (X n )] = E [E [Vft,(X n ) | Fn-r)] = E [P^^V^X^)] 

<A £)/3o E[V /3o (X n _ 1 )] + 6 £A . 
By iterating this majorisation, we get recursively 

n 

E[V A (X n )] < \: A E [V ft (Xo)] + b"£ 

fc=i 

< A e , A) E [V ft) (X )]+ 6 . 

Because the term on the right is independent from n and since, by 
assumption, EV/3 (X ) < oo, the proof is concluded. □ 

The following Lemma is adapted from [lCj, Lemma 4.2]. 

Lemma 22. Assume (-^J\) ■ Let e > ; f3o > 0, r] > 0. For any 

G IC eAhV d = M+{d,e) x /C A)ir , ; let : X L -> 1+ 6e a mea- 

surable function such that 

(56) sup ll^s^H <+oo. 

..... v fj Q 



Define Frs,0) i/ie solution of the Poisson equation (see Corollary 2 ) 

F(Jt,fi) = 5^ P (S,/3){ F (S,/3) - 7r/3(-^(S,/3))} • 

n>0 

T/iere exzsi a constant L e ^ < oo sitc/j that, for any (S,/3), (S',/3') G 

(57) - tt^II^ < L £iA D V/3o [(S,/3), , 
and 

(58) ||P(S,/3)F (Si/3) - P(s',/3')^(S',/3')llv 3o 

< L e,/3 | ||i r (S,/3) - ifc'./Jollv,,, 

+ sup ||F (S)/3) || Dv A ,[(S > j9),(S , ) /30]}. 

We use repeatedly the following elementary result on a projection to 
a closed convex set. 

Lemma 23. Let E 6e an Euclidean space. Given any nonempty closed 
convex set K C E, denote by LTk the projection on the set K. For any 
(x, y) G E x E, ||I1k(x) — LIk(2/) || < — where \\ ■ \\ is the Euclidean 
norm. 
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Lemma 24. Assume (dUty and sup ngN 7 nj 2 < 1, where 7 Wj 2 is defined 
in (TT~5|) and ( 1T9"|) . T/ien, /or all K > and a G [0,1], t/iere exists 
constant K KiO ,^p ^ < oo suc/i for any n G N it holds 



Dv 0O ( S "+l'/3n+l 



V£ (X n+1 )+^a n ,*V£ (X,) 



fc=0 



w/iere 7„ = £)f =1 7n)i , and a n , n = f 7 n , 2 and /or A; G {0, . . . , n - 1} 



(59) 



a n ,k = Jk,2 JJ (1 - 7j,2) • 



Proof. According to ITheorem 31 under (A[TJ) , there exists a constant 

-Ke,a,/3 ,r) SUCn that 

^v« o [(£„,/3J, (S n _i, < K e>a ^ 0>v {\(3 n - /3„ +1 | + |S n - S„+i|} • 



For any £ G {1, . . . , L— 1} consider |p^L — Pr5*|, where {p^ n } is defined 



by tfH]). Since, by_^J^W(p( 1:£ ),a;)| < 1 for any p 
iGX, applying ILemma 23l we obtain 

(60) |pS 1 -pi £) |<7n i i|^ ) (pi-i ) ,X„)|<7„,i. 
Define the function 



and 



where the functions f3^ ' are defined in (I13p . The function (3 is contin- 
uously differentiable. By definition (1T4"]) . for all n G N, p n G [p, p] . 
Hence (160]) implies that there exists K < oo such that 



(61) |/3„ - (3 n+1 \ = |/3(pJ - /3( Pn+1 )| < X 7 n,i • 

Now consider |£ n — £ n +i|. By definition (121]) we get 

I s ? - ES3.il < ex P (Tf )|r<? - rgj + | exp(Tf ) - exp^)^ J 
The scale adaptation procedure (12"0]) by ILemma 231 satisfies 

\T<P -T f 2 1 |< 7n+li3 , 



which implies that there exists constant K < oo such that | exp(T, 



exp(T„ H : 1 )| < i^7 n+13 . By (]T8]) and ILemma 231 we obtain 

I n+ll — 7n+l,2 A^n JK^n+l Pn / ^ 1 n 



< 



7n+l,2 



2|x£ 1 | 2 + 2|pW| 2 + |r 



n I 
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By definition ( ITS!) and (|2"U|) exp(T n ) and |T n | are uniformly bounded for 
all n G N. Hence, gathering all terms, there exists a constant K < oo 
such that 

% [(£„,/U (£„-i, /3 n _ x )] <^ 7 „ +1 [iX^p + l/iWl 2 " , 
where 7 n = X)i=iTM- Under assumption sup ngN 7 ni2 < 1, by (JTHD, 



we get that for any n G N /in = X]fc=o a n,k-^-n}-ki w here the positive 
weights a n fc , A; G {0, . . . , n} are defined in (1591) . Because ^)L a ni £ = 1, 
the Jensen inequality implies 

2 



-W i 1 ^ . i Y (£) I 2 

^n-fcl 

,fc=0 / fc=0 



Finally, under (AHJ for any k > there exists K K such that, for all 
xgX l , |x| 2 <K k V^ o (x). ' □ 

Lemma 25. Assume o,nd sup„ eN 7 n) 2 < 1. For any non-negative 

sequence {6„}„> satisfying J2 n >i fo «(7n,i + 7n,2 + 7*1,3) < 00 and for all 
a G (0, 1), we /iai>e 

00 

J^Dv^ [(E„,/3 n ), (S n -i,^n-i)] V^ (X n ) < +00 a.s. . 

n=l 

Proof. Since sup n6N 7 ni 2 < 1, under (A[T]) by ILemma 241 for all k > 
there exists K < 00 such that 



n 

v£ (x n+ o + $> n , fe v£ (x fe ) 



^0 

< i^7„ + i 



fc=0 

where 7 n = $^ =1 7»M an< ^ ^he triangular array {a n ,fc}Lo ^ s defined in 
(159|) . Set k = 1 — a. Since Xm>i ^™7™ < 00 > ^ * s enou gh to show that 



A d ^ supE I ^Vj- Q (X„, +1 ) + J2a n , k V^ a (X k )j V^ (X n+1 ) j < 00 . 

By Holder inequality we get for all k, n G N 

E{Vj: Q (X fc )V« (X„ +1 )} < {EV ft) (X„ +1 )} a {EV^X*)} 1 -" . 

Since the weights a n fc are non-negative and Y^I=q a n ,fc = 1, we get A < 
2sup ngN EV i a (X n ) the proof is concluded by applying ILemma 2T1 □ 

Proof of \Theorem 51 According to 10|, Corollary 2.9], we need to check 
that (i) sup n>0 E[V(X„)] < +00, and (ii) there exists a G (0,1) such 
that 

00 

E^ 1 ^ [(E n ,/3 n ),(E n _!, £„_!)] V^ o (X n ) < +00 a.s., 
n=i 
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which follow from ILemma 21l and lLemma 25} respectively. 



□ 



5.4. Proof of Proposition 7, In order to prove the existence and 
uniqueness of the root of the mean field h, we first introduce some 
notation and prove the key ILemma 26l 

Below, we omit the set X from the integrals. Let us define a sym- 
metric function h : (0, l] 2 — > [0, 1] as follows 

7T u (x)7r v (y)\ 7T v (dx) ii u (dy) 



(62) h(u,v) 
Note that for all (u,v) G (0, l] 2 , we get 

(63) h(u,v) = 



■K u (y)n v (x) J Z{v) Z(u) 



ir u (v) 



Z u) 



where for v G (0,1], 



(64) 



fv(z) 



dcf 



7T 



'(cto) 



Zv) 



Lemma 26. Assume that A Leb (X) = oo and the density n is positive, 
bounded, and J tt k (x)X (dx) < oo for all < k < 1. Then, for all 
v G (0, 1] the function u i— >■ /i(m, f ) restricted to (0, u) zs differentiable 
and monotonic with the following derivative 



dh 



x [log(7r(y)) - log(7r(«))] 



n u (dy) n u (dz) 
Z{u) Z{u) 



Moreover, lim^o h(u, v ) = and lim u _>„ h(u, v) 



Proof. We will first show that, for any bounded measurable function / 
and u G (0, 1), 



< 65 ) d^ 



/(y)T u (dy) 



/(y)log(7r(y))7r-(di/). 



Since 7r is bounded, there exists a function J : X — > [0, oo) such that 
7r(a;) = cexp(— J(x)). By the dominated convergence theorem, we only 
need to show that \h^ 1 (7T u+h (y) — ir u (y) \ is bounded uniformly for h in 
some neighbourhood of by an integrable function depending only on 
y. We may write 



7r u+h (y) - 7r u (y) 



h 



cexp(-uJ(y)) 



exp(-hJ(y)) - 1| 
\h\ 



Applying the mean value theorem we obtain | exp(— h J(y)) — 1| 
\h\ exp(\h\J(y)). Hence for all \h\ < | we get that 



< 



7T 



\y)-« u {y) 
h 



< cexp 



uJ{y) 



CTX 



u/2 



(y), 
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which concludes the proof of (1B"5]) . 

For any given v, using fl65l) . we compute the partial derivative of h 
with respect to u 



dh 



/«(*(*)) fu<v)f-M [io g My)f U{dy) 



Z{u) J Z{u) J Z{u 

~ f [UAV)) - /■(*(*))] pOg(7T(y)) - lDg(7T(z))] 7r " (<l ' /) 7r " (<l:) 



Z(u) Z{u) 

Since the functions 2; i-> /^(z) and z i-> log(z) are non- decreasing, 

LM^O/)) - /•«( 7r ( 2; ))][ lo g( 7r (l/)) - 1 °g( 7r (^))] > for a11 V>z- Moreover, 
because 7r is positive the (n x 7r)-measure of the sets {y, z G X : ir(y) 7^ 

7r(z)} must be positive due to A Lcb (X) = 00. Hence |^(w,t>) > for 
u E (0,f) and this completes the proof of the first part. 

Since tt u V tt v is integrable, the dominated convergence theorem im- 
plies h(u,v) — > 1 as u — > v. For the second limit consider ( |63|) . For 
any £>0we can find a 5 > such that < e/2. Therefore, 

7r"(dy) < e /^"(dy) , / l {7r>5} (yK(dy) 



_ 2 J Z(u) / ?r u (dy) 

There exists a constant c < 00 such that J t{ 7T> s}(y)'n u (dy) < c for all 
we [0, 1] and 

r n u (dy)> [ 7r"(dy)l {7r < 1} (y) . 



Observe that, for any t/ G X, the function it !->■ 7r"(y)l{ T <i}(i/) is non- 
increasing. Therefore, using the monotone convergence theorem and 
A Leb (X) = 00, we get lim^o / n u (dy) = 00. Hence we can find uq 
such that for all u < uq the normalising constant J ir u (dy) > c/e and 

therefore f fMv))*S$ < e - ° 

Proof of Proposition 7[ We recall that h^(p) depends only on p^ and 
we may write h^'(p) = /i(exp(— exp(p( 1 ^)), 1) — a*, where h is de- 
fined in fl62l) . By ILemma "261 the function u 1— >• h(u,l) is strictly 
monotonic on [0,1] and lim u _>.o h(u, 1) = and lim^-^i 1) = 1. 
Since it — > h(u, 1) is continuous, there exists a unique p^ 1 ) such that 
/ l ( 1 )(p( 1 ),p(2),. ..,p( L - 1 )) = /iW(pW) = 0. We proceed by induction 
and assume the existence of unique p^\ . . . , p^ -1 * 1 such that for any 
k < £ we have h^ k \p^\ . . . , p^) = 0. Denoting 

(66) u(p) = exp(— exp(p)) 

(67) ^-i(p (1;£ - 1} ) = exp ( - $>xp(pW)) , 
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we may write 

We conclude as before from the monotonicity of u y h(u, v) and the 
limits lim^o h(u, v) = and \im u ^ v h(u,v) = 1 (see ILemma 26() . □ 

5.5. Proof of ITheorem 101 Our proof of convergence of the tempera- 
ture adaptation employs classical convergence results on stochastic ap- 
proximation. It is, however, not easy to construct a 'global' Lyapunov 
function for the mean field h, but the specific structure of the problem 
allows to deduce the convergence recursively for . . . ,p( L ~ 1 - ) . 

In this section, for notational simplicity, P(s,/3(p)) = P(s,p) an d 
^(3(p) — For any £ — 1, . . . , L — lwe rewrite (JHJ) as follows 

(68) p(? = n>W x + 7n^ W (P^ l) + 7n,l4 £) + ln,lT^\ , 

where il p is defined in (TjTJJ and 

(69) g^\ P ) = h (uWvt-^-Qlve-ttf 1 *- 1 ))) - a* 

(70) e<? = # ( JL, (X n ) - 7Tp n _ 1 (X n )] 

(71) r<? = ^(ti^^-iCp?^),^!^ 1 ))) 

where fa is defined in fl62|) . if (L-i)( x ) is a shorthand notation for 
#W(p(i^-i) )X ) defined in (USD, u and are defined in (I6"6"|) and fIBTl) . 
respectively, and, by convention, t> = 1. 

We decompose the term e n as the sum of a martingale difference 
term and a remainder term that goes to zero. To do this, we use the 
Poisson decomposition. For (£, p) G .M+(e£, e) x [p,p] defined in 
Section [2j denote by ^ the solution of the Poisson equation 

- P(S,P)^U X ) = <L)(X) - TT,^] , 



which exists by Corollary 2 Hence, e n = 5M n + k„ where 



8MW^H$ n (X n )-P (s 1P )M> (X n _ x ) 
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Lemma 27. Assume (J*U$-(J§M and EV i g (X ) < oo. For all I e 
{1, . . . , L — 1} and T < oo, it holds 



(72) 

(73) 
where 



lim sup 

n ^°° n<k<m(n,T) 



lim sup 



E^ i5M > 



fc 



E 7 ^ 



a.s. 



a.s. , 



m(n, T) = max < j > n : 7^1 < T > . 



i=n+l 



Proof. Consider (1721) . Since are martingale increments, Doob's 

inequality implies 



(74) E sup 



fc>n 
oo 

i=n 



E^> 5M > 



Iff- 



S i-l>Pi-l 



fX, ; ) - P 



Mi) 



Since < 1, Corollary 2 yields 

(75) sup 



H®_Ax)\ <^Vjf(x) 



1 (S,P)' 



(76) 

(s, P )eM^(d,6)x[p,p 
Hence by (1741) we obtain 



™p |P( S ,p)^ >p) WI<^V^(x) 



'1/2/ 



E I sup 

fc>n 



E 



< 



ILemma 2~T1 shows that sup i>0 EV i g (X i ) < 00, and the proof of (172"]) is 
concluded under the step size condition (A|3J) . 

Now consider fl7J). Decompose Et n 7a«f = 
with 



fc-i 



(£,1) dcf 



i=n—l 



E P( S ,p i )^£ >Pi (x i )-P( Si _ llPi _ 1 )^L,p i - 1 ( x ^ 



^n,? - 7n-l,l P (S„_2,P n _ 2 )^sl_ 2 ,p n _ 2 ( X n-l) ~ 7M P (E*-l,P k -i)#:E l Ll,Pk-l( X *) 
fc-1 



<? = E (7.+i,i -^op^-.^o^L^w 

i=n— 1 
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Bv lLemma 22l we get \R^f \ < R { n Xl) + R^ ,h2) where 

oo 



i=n—l 
oo 



(i:£) v 1/2 
P\ Pi_l" V /3 



ft) 
«=n— 1 



Since #w is locally Lipschitz with respect to p, by lLemma 23l we obtain 

||-^ ( (L) - H^.A 1/2 < Kjn . 

" Pi Pi-i" v Po 

ILemma T\\ and (A[3]) imply lim^oo r!£' = a.s. and ILemma 25l 
shows that lim^oo Rn' 1 ' 2 ^ = a.s.. By (1751) . ILemma 211 and (AG5J) , 
the sum £ fc (7 fc) iP (S)fe _ 1)Pfc _ l) H"g_i,P fc _ 1 ( Xfc )) < 00 a - s -' im Plying that 
su Pfc>n l-^nfc^l a - s -- Finally, for the term -R^ fc , (1751) . ILemma 2~T1 
and (A|HJ) conclude the proof. □ 

Proof of \Theorem 1(K For any £ = 1, . . . , L — 1, define the local Lya- 
punov function itfW : R ->• R through 

(77) *>®{p)--={p-p®)\ 

where p^* 1 is the unique solution of the equation g^ l \p) = where 
is defined in (169|) . Observe that, by ILemma 261 and Proposition 7 for 
any £ = 1, . . . , L — 1 the functions w/^ satisfy, for all p G R 

with equality if and only if p = p^. The projection set [p, pi obviously 
satisfies [l8|, A4.3.1] and since for all ^ = 1, . . . , L — 1, p < p^ < p, the 
convergence cannot occur on the boundary of constraint set. Hence 
we only need to check the assumptions of [Tsl . Theorem 6.1.1]. Since 
the mean field gjf} is bounded and continuous, and depends only on p 
the conditions [18|, A6.1.2, A6.1.6, A6.1.7] are satisfied. Condition [18l . 
A6.1.1] is implied by the boundedness of Condition [l8|, A6.1.3] 

is implied by ILemma '271 

(0 

Once we show that the remainder term r„ , defined in (1711) converges 
a.s. to zero as n — > oo, ILemma ~27\ implies the condition |18j, A6.1.4]. 
We will prove inductively that 

(78) lim \r{ e) \ = a.s.. 

n— >oo 

Consider first the case I — 1. By (1681) we get that = 0. Assume 
that for £ < k (1781) holds, then by [18, Theorem 6.1.1] for any £ < k 

we know that p$ — > p^ ■ To simplify notation we denote u n = w(pi fc ' ) ), 
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Vn = v^{p { n k ~ 1] ) and v = ut*)^*- 1 )). By (jS4]h we get that < 2 
for all 2. Hence, using f )63|) . we get 

(r^l < |/i(w n t; n ,t> n ) - ^(w n 6,u n )| + |/i(u»t),i;„) - h(u n v,v)\ 

~7r UnVn (y) 7v Un{j (y)' 



< 



fvMv)) 



Z{u n v n ) Z{u n v) 



Z(v r , 



< 2 
= 2 



7T UnVn {y) (y) 



Z(u n v. 

7T 



Z(u n v) 



Z(u n v n ) 



Z{u n v) 



Ay + 2 
+ 



dy 
dy 



Z(v„) Z(v) 



dy 



TV 



7T 



Z(v) 



TV 



By (9), Lemma A.l] we get that 

|r^| <4{\log(Z(u n v n ))-\og(Z(u n v))\ + \log(Z(v n ))-\og(Z(v))\} , 

where the normalising constants Z are defined in ([2]). Since a 1— > 
log(Z(a)) is locally Lipschitz, u n and v n are in compact set we get that 
exists K < 00 such that 



.(*)! 



< K(l + u r 



v„ - V 



(1-1-1) 



I — y 



By f lT4"|) and f[6"6"j) we obtain that sup n \u n \ < 00. Hence, since p 
v (k) (pt 1 ^- 1 )) i s continuous, by induction assumption, we conclude that 

|rl fe) | ->• a.s.. □ 



Acknowledgements 

The first two authors were supported by the French National Re- 
search Agency under the contract ANR-08-BLAN-0218 "BigMC". The 
third author was supported by the Academy of Finland (project 250575) 
and by the Finnish Academy of Science and Letters, Vilho, Yrjo and 
Kalle Vaisala Foundation. 



References 

[1] C. Andrieu and E. Moulines. On the ergodicity properties of some 
adaptive MCMC algorithms. Ann. Appl. Probab., 16(3):1462- 
1505, 2006. 

[2] C. Andrieu and J. Thorns. A tutorial on adaptive MCMC. Statist. 

Corn-put., 18(4):343-373, 2008. 
[3] Y. Atchade and G. Fort. Limit theorems for some adaptive MCMC 

algorithms with subgeometric kernels. Bernoulli, 16(1):116-154, 

2010. 

[4] Y. F. Atchade and J. S. Rosenthal. On adaptive Markov chain 
Monte Carlo algorithms. Bernoulli, ll(5):815-828, 2005. 



32 BLAZEJ MIASOJEDOW, ERIC MOULINES AND MATTI VIHOLA 

[5] Y. F. Atchade, G. O. Roberts, and J. S. Rosenthal. Towards 
optimal scaling of Metropolis-coupled Markov chain Monte Carlo. 
Statist. Comput., 21(4):555-568, 2011. 

[6] J. D. Banfield and A. E. Raftery. Ice floe identification in satel- 
lite images using mathematical morphology and clustering about 
principal curves. J. Amer. Statist. Assoc., 87:7-16, 1992. 

[7] M. Baragatti, A. Grimaud, and D. Pommeret. Parallel tempering 
with equi-energy moves. Statist. Comput., to appear. 

[8] L. Bornn, P. E. Jacob, P. D. Moral, and A. Doucet. 
An adaptive interacting Wang-Landau algorithm for auto- 
matic density exploration. Technical report, 2011. URL 
|http : //arxiv. org/ abs/ 1109 . 3829v2[ 

[9] R. Douc, E. Moulines, and J. S. Rosenthal. Quantitative bounds 
on convergence of time-inhomogeneous Markov chains. Ann. Appl. 
Prob, 2004:1643-1665, 2002. 
[10] G. Fort, E. Moulines, and P. Priouret. Convergence of adaptive and 
interacting Markov chain Monte Carlo algorithms. Ann. Statist., 
39(6):3262-3289, 2011. 
[11] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain 
Monte Carlo in Practice. Chapman & Hall/CRC, Boca Raton, 
Florida, 1998. ISBN 0-412-05551-1. 
[12] H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis 

algorithm. Bernoulli, 7(2):223-242, 2001. 
[13] U. H. E. Hansmann. Parallel tempering algorithm for conforma- 
tional studies of biological molecules. Chem. Phys. Lett., 281(1-3): 
140-150, 1997. 

[14] D. M. Higdon. Auxiliary variable methods for Markov chain Monte 
Carlo with applications. J. Amer. Statist. Assoc., 93:585-595, 
1997. 

[15] S. F. Jarner and E. Hansen. Geometric ergodicity of Metropolis 

algorithms. Stochastic Process. Appl., 85(2):341-361, 2000. 
[16] D. Kofke. On the acceptance probability of replica-exchange Monte 

Carlo trials. J. Chem. Phys., 117(15):6911-1914, 2002. 
[17] A. Kone and D. Kofke. Selection of temperature intervals for 

parallel-tempering simulations. J. Chem. Phys., 122, 2005. 
[18] H. Kushner and G. Yin. Stochastic Approximation and Recursive 

Algorithms and Applications. Springer, 2003. 
[19] F. Liang and W. H. Wong. Evolutionary Monte Carlo for protein 

folding simulations. J. Chem. Phys., 115(7), 2001. 
[20] E. Marinari and G. Parisi. Simulated tempering: a new Monte 

Carlo scheme. Europhys. Lett, 19(6):451-458, 1992. 
[21] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, 

and E. Teller. Equations of state calculations by fast computing 

machines. J. Chem. Phys., 21(6):1087-1092, 1953. 



ADAPTIVE PARALLEL TEMPERING ALGORITHM 



33 



[22] S. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stabil- 
ity. Cambridge University Press, New York, NY, USA, 2nd edition, 
2009. ISBN 0521731828, 9780521731829. 

[23] S. P. Meyn and R. L. Tweedie. Computable bounds for geometric 
convergence rates of Markov chains. Ann. Appl. Probab., 4(4): 
981-1011, 1994. 

[24] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. 

Springer- Verlag, New York, 1999. ISBN 0-387-98707-X. 
[25] G. O. Roberts and J. S. Rosenthal. Optimal scaling for various 

Metropolis-Hastings algorithms. Statist. Sci., 16(4):351-367, 2001. 
[26] G. O. Roberts and J. S. Rosenthal. Examples of adaptive MCMC. 

J. Comput. Graph. Statist, 18(2):349-367, 2009. 
[27] G. O. Roberts and J. S. Rosenthal. Minimising MCMC 

variance via diffusion limits, with an application to 

simulated tempering. Technical report, 2012. URL 

http : //probability . ca/jeff /research. html . 
[28] G. O. Roberts, A. Gelman, and W. R. Gilks. Weak convergence 

and optimal scaling of random walk Metropolis algorithms. Ann. 

Appl. Probab., 7(1): 110-120, 1997. 
[29] E. Saksman and M. Vihola. On the ergodicity of the adaptive 

Metropolis algorithm on unbounded domains. Ann. Appl. Probab., 

20(6):2178-2203, 2010. 
[30] R. H. Swendsen and J.-S. Wang. Replica Monte Carlo simulation 

of spin-glasses. Phys. Rev. Lett, 57(21):2607-2609, 1986. 
[31] M. Vihola. On the stability and ergodicity of adaptive scaling 

Metropolis algorithms. Stochastic Process. Appl., 121(12):2839— 

2860, 2011. 

[32] M. Vihola. Can the adaptive Metropolis algorithm collapse with- 
out the covariance lower bound? Electron. J. Probab., 16:45-75, 
2011. 

[33] M. Vihola. Robust adaptive Metropolis algorithm with coerced 
acceptance rate. Statist. Comput., 2012. to appear. 

[34] D. B. Woodard, S. C. Schmidler, and M. Huber. Conditions for 
rapid mixing of parallel and simulated tempering on multimodal 
distributions. Ann. Appl. Probab., 19(2):617-640, 2009. 

Blazej Miasojedow, LTCI, CNRS UMR 8151, Institut Mines-Telecom/Telecom 
ParisTech, 46, rue Barrault, 75634 Paris Cedex 13, France 

Eric Moulines, LTCI, CNRS UMR 8151, Institut Mines-Telecom/Telecom 
ParisTech, 46, rue Barrault, 75634 Paris Cedex 13, France 

Matti Vihola, Department of Mathematics and Statistics, P.O.Box 
35, FI-40014 University of Jyvaskyla, Finland 



